High Performance Computing

chapter

The generation of optimized codes using nonzero structure analysis

Bret A. Marsolf, Aart J. C. Bik, Kyle A. Gallivan, Harry A. G. Wijshoff

Lecture Notes in Computer Science > High Performance Computing > 1-29

In this paper we consider techniques for improving the performance of codes for general sparse problems by extracting both local and global structure information from a sparse matrix instance. This information can be used to improve the performance of the primitives through the utilization of specialized methods for the component parts which result from the matrix decomposition. A calculus is defined...

chapter

On the importance of an end-to-end view of memory consistency in future computer systems

Guang R. Gao, Vivek Sarkar

Lecture Notes in Computer Science > High Performance Computing > 30-41

The main purpose of a memory consistency model is to serve as an agreement between hardware system designers and software developers on the semantics of memory operations so as to ensure correct execution of user programs. However, the bulk of past work on memory consistency models has been pursued from the hardware viewpoint. In this viewpoint, a memory consistency model is used to specify certain...

chapter

High performance distributed object systems

Dennis Gannon

Lecture Notes in Computer Science > High Performance Computing > 42-50

This paper will provide a survey of current work on object oriented tools and techniques for metacomputing systems. More specifically, we consider the problem of designing a software component architecture that extends the current emerging desktop object composition models to the domain of high performance networks and massively parallel compute servers.

chapter

Instruction cache prefetching using multilevel branch prediction

Alexander V. Veidenbaum

Lecture Notes in Computer Science > High Performance Computing > 51-70

This paper presents an instruction cache prefetching mechanism capable of prefetching past branches in multiple-issue processors. Such processors at high clock rates often use small instruction caches which have significant miss rates. Prefetching from secondary cache can hide the instruction cache miss penalties but only if initiated sufficiently far ahead of the current program counter. Existing...

chapter

High performance wireless computing

George Cybenko

Lecture Notes in Computer Science > High Performance Computing > 71-71

This session deals with fundamental issues arising in wireless networking and their implications for high performance business, consumer and military computing applications. We will describe the physical basis for wireless networking, mobile IP solutions and challenges as well as ongoing research efforts to make high performance and high confidence computing over wireless networks possible. ...

chapter

High-performance computing and applications in image processing and computer vision

Hamid R. Arabnia

Lecture Notes in Computer Science > High Performance Computing > 72-72

A variety of parallel computer architectures are being used today to cope with the computationally intensive tasks in the areas of image processing and computer vision. Most image processing algorithms can readily exploit SIMD (Single Instruction, Multiple Data Stream) machine architectures. The mapping of these algorithms to such machines is rather straightforward. The fine granularity parallelism...

chapter

Present and future of HPC technologies

Tadashi Watanabe

Lecture Notes in Computer Science > High Performance Computing > 73-74

From the beginning of computer history, there has been strong demand for faster computation, while at the same time there is growing demand for lower cost, including lower operating cost, and for ease of use. In responding to these demands, there are three major technological points, though they are not limited to the high performance computing area. The state-of-the-art technologies...

chapter

Evaluation of multithreaded processors and thread-switch policies

Richard J. Eickemeyer, Ross E. Johnson, Steven R. Kunkel, Beng-Hong Lim, et al.

Lecture Notes in Computer Science > High Performance Computing > 75-90

This paper examines the use of coarse-grained multithreading to lessen the negative impact of memory access latencies on the performance of uniprocessor on-line transaction processing systems. It considers the effect of switching threads on cache misses in a two-level cache system. It also examines several different thread-switch policies. The results suggest that multithreading with a small number...

chapter

A multithreaded implementation concept of prolog on Datarol-II machine

Peter Kacsuk, Makoto Amamiya

Lecture Notes in Computer Science > High Performance Computing > 91-106

The paper presents a massively parallel implementation method of Prolog on the multithreaded parallel machine, Datarol-II. First the Logicflow model is introduced which was developed for implementing Prolog on massively parallel computers. The Logicflow is a dataflow-like graph in which nodes are macro dataflow nodes and tokens represent macrothreads. The Datarol-II architecture efficiently supports...

chapter

Thread Synchronization Unit (TSU): A building block for high performance computers

Paraskevas Evripidou

Lecture Notes in Computer Science > High Performance Computing > 107-118

The Thread Synchronization Unit (TSU) is a hardware mechanism that provides data-driven thread synchronization and data consistency for multi-threaded architectures built with control-flow (i.e. commodity) microprocessors. The TSU design is based on the Decoupled Data-Driven model of execution. This model decouples the synchronization from the computation portions of a program and allows them to execute...

chapter

Data dependence path reduction with tunneling load instructions

Toshinori Sato

Lecture Notes in Computer Science > High Performance Computing > 119-130

A technique for reducing the length of the data dependence path is presented. This technique, named tunneling-load, utilizes the register specifier buffer to hide the load latency, and thus reduces the length of the data dependence path. True data dependences cannot be removed by techniques such as register renaming, and are an unavoidable obstacle limiting the instruction level parallelism...

chapter

Performance estimation of embedded software with pipeline and cache hazard modeling

Norbert Imlig, Akihiro Tsutsui

Lecture Notes in Computer Science > High Performance Computing > 131-142

A major challenge in telecommunication design is introducing flexibility while still meeting real-time performance goals. Keeping both flexibility and performance while minimizing cost leads to mixed hardware-software systems. In the absence of a generic partitioning algorithm, accurate cost and performance modeling become crucial when exploring architectural alternatives. This paper presents a case...

chapter

An implementation and evaluation of a distributed shared-memory system on workstation clusters using fast serial links

Hironori Nakajo, Akihiro Ichikawa, Yukio Kaneda

Lecture Notes in Computer Science > High Performance Computing > 143-158

We summarize an implementation of a distributed shared-memory system on a workstation cluster. In this paper, we introduce fast serial links called Serial Transparent Asynchronous First-in First-out Link (STAFF-Link). By using these links we construct a parallel processing system based on the workstation cluster. In the workstation cluster, a distributed shared-memory mechanism is utilized for interprocess...

chapter

Designing and optimizing 3-connectivity communication networks using a distributed genetic algorithm

Jianhua Ma, Runhe Huang, Eiju Tsuboi

Lecture Notes in Computer Science > High Performance Computing > 159-170

In this paper, a distributed genetic algorithm (DGA) for 3-connectivity communication network design is proposed and implemented on a transputer based parallel machine, ParsyTec Gcel-164. The paper emphasizes how parallelism can be used with the genetic algorithm. Performance of the (sequential) genetic algorithm (GA) is compared to Dijkstra's algorithm (DA) in terms of computation time and total link...

chapter

Adaptive routing on the Recursive Diagonal Torus

A. Funahashi, T. Hanawa, T. Kudoh, H. Amano

Lecture Notes in Computer Science > High Performance Computing > 171-182

The Recursive Diagonal Torus (RDT), consisting of recursively structured tori, is an interconnection network for massively parallel computers. By recursively adding remote links in the diagonal directions of the torus network, the diameter can be reduced to within log₂N with a smaller number of links than that of a hypercube. For an interconnection network for massively parallel computers, a routing...

chapter

Achieving multi-level parallelization

Carrie J. Brownhill, Alexandru Nicolau, Steve Novack, Constantine D. Polychronopoulos

Lecture Notes in Computer Science > High Performance Computing > 183-194

Many modern machine architectures feature parallel processing at both the fine-grain and coarse-grain level. In order to efficiently utilize these multiple levels, a parallelizing compiler must orchestrate the interactions of fine-grain and coarse-grain transformations. The goal of the PROMIS compiler project is to develop a multi-source, multi-target parallelizing compiler in which the front-end and...

chapter

A technique to eliminate redundant inter-processor communication on parallelizing compiler TINPAR

Atsushi Kubota, Shogo Tatsumi, Toshihiko Tanaka, Masahiro Goshima, et al.

Lecture Notes in Computer Science > High Performance Computing > 195-204

Optimizing inter-processor (PE) communication is crucial for parallelizing compilers for message-passing parallel machines to achieve high performance. In this paper, we propose a technique to eliminate redundant inter-PE messages. This technique utilizes a data-flow analysis to find a definition point that corresponds to a use point where the definition and the use occur in different PEs. If...

chapter

An automatic vectorizing/parallelizing Pascal compiler V-Pascal ver. 3

Tetsutaro Uehara, Yoshitoshi Kunieda, Takao Tsuda

Lecture Notes in Computer Science > High Performance Computing > 205-216

This paper describes the design and implementation of the automatic vectorizing and parallelizing compiler named V-Pascal Version 3. The compiler is designed as a workbench on which various vectorizing and parallelizing techniques are evaluated. The compiler can currently vectorize/parallelize multiply-nested loops as reduced single loops, vectorize while-loops and recursive calls,...

chapter

An algorithm for automatic detection of loop indices for communication overlapping

Kazuaki Ishizaki, Hideaki Komatsu, Toshio Nakatani

Lecture Notes in Computer Science > High Performance Computing > 217-230

This paper presents a compiler algorithm that automatically detects the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. It also describes a method of generating communication for the tiled loop on distributed memory machines. The algorithm presented here has been implemented in our High Performance Fortran (HPF)...

chapter

NaraView: An interactive 3D visualization system for parallelization of programs

Mariko Sasakura, Kazuki Joe, Keijiro Araki

Lecture Notes in Computer Science > High Performance Computing > 231-242

For effective use of parallelizing compilers, an interactive environment that allows users to direct how a program is parallelized is needed. As a first step toward building such an environment, we have developed a program visualization system named NaraView. The system provides two powerful methods for 3D visualization of program structure and data dependence. 3D visualization of program structure illustrates...

INFONA - science communication portal

High Performance Computing
International Symposium, ISHPC'97 Fukuoka, Japan, November 4–6, 1997 Proceedings

